Abstract — We study fountain codes transmitted over the binary-input symmetric-output channel. For channels with small capacity, receivers need to collect many channel outputs to recover the information bits. Since each collected channel output yields a check node in the decoding Tanner graph, a channel with small capacity leads to large decoding complexity. In this paper, we introduce a novel fountain coding scheme with non-binary LDPC codes. The decoding complexity of the proposed fountain code does not depend on the channel. Numerical experiments show that the proposed codes exhibit better performance than conventional fountain codes, especially for a small number of information bits.

Index Terms — fountain codes, rateless codes, non-binary LDPC 
codes 

I. Introduction 

Fountain codes are a class of erasure-recovering or error-correcting codes that produce a limitless sequence of encoded bits from k information bits so that receivers can recover the k information bits from any $(1+\epsilon)k/C$ encoded bits, where C is the channel capacity and ε is referred to as the overhead. The name comes from water fountains, which endlessly produce water drops to entertain people. Designing fountain codes with small overhead is desirable. LT codes [1] and Raptor codes [2] are fountain codes that achieve vanishing overhead $\epsilon \to 0$ in the limit of large information size over the channel with C = 1, i.e., the binary erasure channel (BEC). Owing to a natural analogy between the BEC and the packet erasure channel, fountain codes have been successfully adopted by several industry standards.

In [3], Etesami et al. investigated Raptor codes used over memoryless binary-input output-symmetric (MBIOS) channels. They showed that over AWGN channels with capacity C > 0.49, Raptor codes achieve overhead ε < 0.08 at a BER of $10^{-\dots}$ with information size k = 65536. A Raptor code can be viewed as a concatenation of an outer high-rate LDPC code and infinitely many single parity-check codes of length d, where d is chosen randomly with probability $\Omega_d$ for $d \ge 1$. In [4], Venkiah et al. proposed joint decoding of the concatenated codes and an optimization method for output degree distributions $\Omega(x) := \sum_{d \ge 1} \Omega_d x^d$, and showed that the optimized codes outperform the conventional ones.

The problems in constructing fountain codes for general channels with finite input alphabets are summarized as follows.

• Problem 1: The output degree distribution $\Omega(x)$ needs to be optimized for each k. Moreover, a large check node degree d leads to large encoding and decoding complexity.

• Problem 2: The number of check nodes in the inner codes is given by $(1+\epsilon)k/C$. This increases as the channel capacity C decreases. Since check node computation dominates decoding, the decoding complexity is high for small C.

• Problem 3: A large information size and vanishing overhead are often assumed. This requires large memory devices and leads to long transmission latency.

In this paper, we propose a novel fountain coding scheme that is free of these drawbacks.

In this paper, we consider non-binary LDPC codes defined by sparse parity-check matrices over $\mathrm{GF}(2^m)$ for $2^m > 2$. Non-binary LDPC codes were invented by Gallager [5], and Davey and MacKay [6] found that non-binary LDPC codes can outperform binary ones. Non-binary LDPC codes have recently attracted much attention due to their decoding performance.

It is known that the irregularity of Tanner graphs helps improve the decoding performance of binary LDPC codes. However, this is not the case for non-binary LDPC codes. Interestingly, $(2,d_c)$-regular non-binary LDPC codes over $\mathrm{GF}(2^m)$ are empirically known [7] to be the best-performing codes for $2^m > 64$, especially for short code lengths. This means that, when designing non-binary LDPC codes, one does not need to optimize the degree distributions of Tanner graphs, since $(2,d_c)$-regular non-binary LDPC codes perform best. Furthermore, the sparsity of the $(2,d_c)$-regular Tanner graph makes efficient decoding possible.

II. Fountain Coding with Multiplicatively 
Repeated Non-Binary LDPC Codes 

In this section we explain a new fountain coding scheme. 
The new coding scheme uses a non-binary LDPC code as a 
pre-code. 

In [8], the authors presented low-rate non-binary codes. The code is a concatenation of a (2,3)-regular non-binary LDPC code and inner multiplicative repetition codes. In general, low-rate LDPC codes have many check nodes and suffer from higher decoding complexity than high-rate codes. One of the remarkable features of the code in [8] is that its decoding complexity does not depend on the coding rate. The code exhibits excellent decoding performance for short code lengths and is rate-compatible. We use the low-rate code [8] with vanishing rate as a fountain code.

Fig. 1. An example of a pre-code $C_1$: a non-binary (2,3)-regular LDPC code of rate 1/3 over $\mathrm{GF}(2^m)$. Each variable node represents a symbol in $\mathrm{GF}(2^m)$. Each check node represents a parity-check equation over $\mathrm{GF}(2^m)$. The code length is 18 symbols in $\mathrm{GF}(2^m)$, or equivalently 18m bits.

We fix a Galois field $\mathrm{GF}(2^m)$ with a primitive element α and its primitive polynomial π(x). Once the primitive element is fixed, one can represent each symbol in the Galois field as a binary sequence of length m [9]. For example, with a primitive element $\alpha \in \mathrm{GF}(2^3)$ such that $\pi(\alpha) = \alpha^3 + \alpha + 1 = 0$, each symbol is represented as $0 = (0,0,0)$, $1 = (1,0,0)$, $\alpha = (0,1,0)$, $\alpha^2 = (0,0,1)$, $\alpha^3 = (1,1,0)$, $\alpha^4 = (0,1,1)$, $\alpha^5 = (1,1,1)$ and $\alpha^6 = (1,0,1)$. In this setting, k information bits can be represented as a sequence of k/m symbols $(x_1, \ldots, x_{k/m}) \in \mathrm{GF}(2^m)^{k/m}$. Note that what corresponds to a packet in a typical fountain coding system is not the symbol sequence but each bit of the symbols $x_i$ for $i \ge 1$. We refer to elements of $\mathrm{GF}(2^m)$ as symbols for $m \ge 2$ and as bits for m = 1.
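The representation table above can be reproduced mechanically: starting from 1, repeatedly multiply by α (a left shift of the binary representation) and substitute $\alpha^3 = \alpha + 1$ whenever the degree reaches 3. The Python sketch below stores each symbol as an integer whose bit j is the coefficient of $\alpha^j$; this integer encoding and the function names are illustrative choices, not part of the scheme.

```python
def power_table(m=3, poly=0b1011):
    """Return [alpha^0, ..., alpha^(2^m - 2)] as integers.

    Bit j of each integer is the coefficient of alpha^j; poly = 0b1011
    encodes the primitive polynomial x^3 + x + 1.
    """
    powers, x = [], 1
    for _ in range(2 ** m - 1):
        powers.append(x)
        x <<= 1              # multiply by alpha
        if x >> m:           # degree m reached: substitute using pi(alpha) = 0
            x ^= poly
    return powers


def to_bits(x, m=3):
    """Tuple (coefficient of 1, alpha, ..., alpha^(m-1))."""
    return tuple((x >> j) & 1 for j in range(m))


for e, x in enumerate(power_table()):
    print(f"alpha^{e} = {to_bits(x)}")
# prints alpha^0 = (1, 0, 0) through alpha^6 = (1, 0, 1), matching the list above
```

Because α is primitive, the seven powers are exactly the seven nonzero symbols; together with 0 = (0, 0, 0) they exhaust $\mathrm{GF}(2^3)$.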

A non-binary LDPC code C over $\mathrm{GF}(2^m)$ is defined by the null space of a sparse M × N parity-check matrix $H = \{h_{ij}\}$ defined over $\mathrm{GF}(2^m)$:

$$C = \{x \in \mathrm{GF}(2^m)^N \mid Hx^{\mathsf{T}} = 0 \in \mathrm{GF}(2^m)^M\}.$$

The c-th parity-check equation for c = 1, …, M is written as

$$h_{c,1}x_1 + \cdots + h_{c,N}x_N = 0 \in \mathrm{GF}(2^m),$$

where $h_{c,1}, \ldots, h_{c,N} \in \mathrm{GF}(2^m)$ and $x_1, \ldots, x_N \in \mathrm{GF}(2^m)$.
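A minimal sketch of the parity-check condition, assuming $\mathrm{GF}(2^3)$ with $\pi(x) = x^3 + x + 1$ and representing each symbol as an integer whose bit j is the coefficient of $\alpha^j$: addition is then bitwise XOR, and multiplication is shift-and-add with reduction by π. The 2 × 3 matrix H is a hypothetical toy example, not a parity-check matrix from this paper.

```python
# Assumption: GF(2^3) with pi(x) = x^3 + x + 1; bit j of a symbol is the
# coefficient of alpha^j.
M_BITS, POLY = 3, 0b1011


def gf_mul(a, b):
    """Multiply two GF(2^m) elements (shift-and-add, reducing by POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a              # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1
        if a >> M_BITS:         # degree m reached: reduce modulo pi
            a ^= POLY
    return r


def syndrome(H, x):
    """Return H x^T over GF(2^m); x is a codeword iff every entry is 0."""
    result = []
    for row in H:
        s = 0
        for h, xv in zip(row, x):
            s ^= gf_mul(h, xv)
        result.append(s)
    return result


# Hypothetical toy matrix with rows (1, alpha, alpha^2) and (1 + alpha, 0, 1).
H = [[1, 2, 4], [3, 0, 1]]
print(syndrome(H, [6, 1, 1]))   # [0, 0]: a codeword
print(syndrome(H, [6, 1, 0]))   # [4, 1]: both checks violated
```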

Binary LDPC codes are represented by Tanner graphs with variable and check nodes [10, p. 75]. In this paper, non-binary LDPC codes are also represented by bipartite graphs with variable nodes and check nodes, which are also referred to as Tanner graphs. For a given sparse parity-check matrix $H = \{h_{cv}\}$ over $\mathrm{GF}(2^m)$, the graph is defined as follows. The v-th variable node and the c-th check node are connected if $h_{cv} \neq 0$. We also denote the v-th variable node and the c-th check node by v = 1, …, N and c = 1, …, M, respectively.

A non-binary LDPC code with a parity-check matrix over $\mathrm{GF}(2^m)$ is called $(d_v,d_c)$-regular if all the columns and all the rows of the parity-check matrix have weight $d_v$ and $d_c$, respectively, or equivalently, all the variable and check nodes have degree $d_v$ and $d_c$, respectively. Let $C_1$ be a (2,3)-regular LDPC pre-code defined over $\mathrm{GF}(2^m)$ of length N symbols, or equivalently mN bits, and of rate 1/3. Since the mN/3 information bits of $C_1$ are the k information bits, N = 3k/m. The pre-code $C_1$ has a 2N/3 × N sparse parity-check matrix $H = \{h_{ij}\}$ over $\mathrm{GF}(2^m)$. The matrix H has row weight 3 and column weight 2. Fig. 1 shows the Tanner graph of $C_1$ of length N = 18 symbols. It can be shown that $(2,d_c)$-regular non-binary LDPC codes are encodable in linear time by using a non-singular zig-zag subgraph.
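One common way to draw a random (2,3)-regular Tanner graph is socket matching: give every variable node two edge sockets and every check node three, and connect them through a random permutation. The sketch below assumes that construction (this section does not say how $C_1$ is sampled), redraws until no check node is connected twice to the same variable node, and draws each nonzero coefficient $h_{cv}$ uniformly from $\mathrm{GF}(2^m)\setminus\{0\}$. It does not build the zig-zag structure used for linear-time encoding; the function name is hypothetical.

```python
import random


def random_23_regular(N, m, seed=0, max_tries=1000):
    """Return M = 2N/3 checks, each a list of (v, h_cv) pairs.

    Socket matching: 2 sockets per variable node, 3 per check node.
    """
    if N % 3:
        raise ValueError("N must be a multiple of 3 for a (2,3)-regular graph")
    rng = random.Random(seed)
    M = 2 * N // 3
    for _ in range(max_tries):
        sockets = [v for v in range(N) for _ in range(2)]   # 2N variable sockets
        rng.shuffle(sockets)
        checks = [sockets[3 * c: 3 * c + 3] for c in range(M)]
        if all(len(set(vs)) == 3 for vs in checks):          # no parallel edges
            return [[(v, rng.randrange(1, 2 ** m)) for v in vs] for vs in checks]
    raise RuntimeError("no simple graph found; increase max_tries")


checks = random_23_regular(N=18, m=8)
print(len(checks), "checks; design rate", 1 - len(checks) / 18)
```

The design rate $1 - M/N = 1 - d_v/d_c = 1/3$ follows from counting edges: $2N = 3M$.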

We define a new fountain code $C_\infty : \mathrm{GF}(2)^k \to \mathrm{GF}(2)^\infty$ by giving the encoding procedure as follows.

1) First, map the k information bits to k/m information $\mathrm{GF}(2^m)$-symbols.

2) By the pre-code $C_1$, encode the k/m information symbols to N symbols $x_1, \ldots, x_N \in \mathrm{GF}(2^m)$.

3) Repeat the following endlessly for i = 1, 2, …:

a) Pick $v_i \in [1,N]$, $w_i \in [1,m]$ and $h_i \in \mathrm{GF}(2^m)\setminus\{0\}$ at random.

b) Transmit the $w_i$-th bit of $h_i x_{v_i} \in \mathrm{GF}(2^m)$.
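Step 3 can be sketched as a Python generator. The sketch assumes $\mathrm{GF}(2^8)$ with the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (a common choice; the paper does not state which π it uses for m = 8), uses 0-based indices instead of [1, N] and [1, m], and takes the pre-code symbols $x_1, \ldots, x_N$ as given, since encoding by $C_1$ itself is not shown. It yields $(v_i, w_i, h_i)$ together with the bit because the receiver must learn these values, for example from a packet header.

```python
import random
from itertools import islice

# Assumption: GF(2^8) with pi(x) = x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
M_BITS, POLY = 8, 0x11D


def gf_mul(a, b):
    """Multiply two GF(2^m) elements (shift-and-add, reducing by POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a              # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1
        if a >> M_BITS:         # degree m reached: reduce modulo pi
            a ^= POLY
    return r


def fountain(x, seed=None):
    """Endless stream of (v, w, h, bit) for pre-code symbols x[0..N-1].

    Indices are 0-based; bit is the w-th bit of h * x[v] in GF(2^m).
    """
    rng = random.Random(seed)
    while True:
        v = rng.randrange(len(x))                   # step 3a: v_i
        w = rng.randrange(M_BITS)                   # step 3a: w_i
        h = rng.randrange(1, 2 ** M_BITS)           # step 3a: h_i != 0
        yield v, w, h, (gf_mul(h, x[v]) >> w) & 1   # step 3b


# Random stand-in for N = 18 pre-code symbols (not an actual C1 codeword).
rng = random.Random(1)
x = [rng.randrange(2 ** M_BITS) for _ in range(18)]
for packet in islice(fountain(x, seed=7), 5):
    print(packet)
```

Seeding the generator makes the stream reproducible, which is one way sender and receiver could agree on $(v_i, w_i, h_i)$ without transmitting them.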

The proposed fountain code $C_\infty$ can be viewed as a non-binary Raptor code with a non-binary (2,3)-regular LDPC pre-code and an output degree distribution $\Omega(x) = x$ [2]. Note that $\Omega(x) = x$ does not mean simple repetition of bits but multiplicative repetition of symbols in $\mathrm{GF}(2^m)$ in the proposed non-binary setting.

III. Decoding Scheme 

We assume that transmission takes place over an MBIOS channel. Specifically, the channel is specified by the transition probability $P(\cdot|\cdot)$ such that $P(y|x) = \Pr(Y = y \mid X = x)$, where X and Y are the random variables of an input bit x and the channel output y, respectively. We also assume that the information bits are chosen with uniform probability.

The most important feature of a fountain coding system is that the decoder does not receive all the channel outputs but collects only n of them. The decoder recovers the k information bits from the n collected channel outputs. The overhead ε is defined in [3], [11] by

$$\epsilon := C/R - 1, \qquad R = k/n,$$

where C is the channel capacity. Then, the decoder has $n = (1+\epsilon)k/C$ collected channel outputs. Note that, in the original setting of fountain codes as in [1], [2], the capacity is set to C = 1, i.e., all the collected bits are uncorrupted. The aim of the fountain coding in this paper is to reliably recover the information bits with small overhead. The overhead ε = 0 implies that the information bits are transmitted at rate R = C, which is our ultimate aim. With infinitely many information bits, Raptor codes can achieve ε = 0 for the channel with C = 1, i.e., the BEC. Raptor codes optimized for the BEC exhibit quite good performance for a large number of information bits, e.g., k = 65536. However, for both the BEC and general MBIOS channels with C < 1, Raptor codes exhibit high error floors [3], [11], [12], [4] for a small number of information bits, e.g., k = 1024.
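As a small worked example of the definition $\epsilon := C/R - 1$ with $R = k/n$, the helpers below convert between ε and the number n of collected outputs. The function names are hypothetical, and the BEC with erasure probability 0.5 (capacity C = 1 − 0.5 = 0.5) serves only as an illustrative channel.

```python
import math


def outputs_needed(k, eps, C):
    """Smallest n with n >= (1 + eps) k / C, from eps = C / R - 1 and R = k / n."""
    return math.ceil((1 + eps) * k / C)


def overhead(k, n, C):
    """eps = C / R - 1 with R = k / n."""
    return C / (k / n) - 1


# Illustration: a BEC with erasure probability 0.5 has capacity C = 0.5.
n = outputs_needed(k=1024, eps=0.081, C=0.5)
print(n, overhead(1024, n, 0.5))   # 2214 outputs, overhead slightly above 0.081
```

Halving the capacity doubles the number of outputs the receiver must collect for the same overhead.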

For the i-th transmitted bit, the sender picked $v_i \in [1,N]$, $w_i \in [1,m]$ and $h_i \in \mathrm{GF}(2^m)\setminus\{0\}$ at random and transmitted the $w_i$-th bit of $h_i x_{v_i} \in \mathrm{GF}(2^m)$. Let I be the set of transmission indices that the receiver collected; hence #I = n. In other words, for $i \in I$, the receiver collects $y_i$, the corrupted version of the i-th transmitted bit. We assume that the decoder knows not only $y_i$ but also the indices $v_i$, $w_i$ and the multiplicative coefficients $h_i$ for $i \in I$. In practice, this is realized by embedding the indices in the packet headers or by synchronization between the sender and the receivers [12].

The proposed code $C_\infty$ can be decoded by the sum-product (SP) decoding algorithm on the Tanner graph. The SP decoder for non-binary LDPC codes exchanges probability vectors in $\mathbb{R}^{2^m}$, called messages, between variable nodes and check nodes [13]. An example of the Tanner graph used by the decoder is shown in Fig. 2. The variable nodes of degree one with white dots in Fig. 2 represent collected channel outputs. If the SP decoding algorithm is applied directly to the proposed codes, all the variable nodes and check nodes, including the variable nodes of the multiplicatively repeated symbols, are activated, i.e., they exchange messages. However, the messages arriving at the variable nodes of degree one do not change the messages sent back from those nodes: a variable node of degree one has no other incoming edge, so its outgoing message is always its channel message. Therefore, after the initialization, the decoder does not need to pass messages all the way to those variable nodes of degree one and their adjacent check nodes of degree two. Consequently, the decoder uses only the Tanner graph of the pre-code $C_1$. It follows that the complexity of the decoding algorithm depends neither on the number n of collected channel outputs nor on the channel capacity C. In contrast, the decoding complexity of conventional fountain codes largely depends on n and C, as explained in Section I.

Fig. 2. An example of a Tanner graph used for decoding. Some variable nodes are of degree one. The variable nodes of degree one correspond to the transmitted symbols whose channel outputs are collected by the decoder. White dots represent bits corresponding to the received channel outputs. It can be seen that the decoder collected 22 channel outputs in this example.

The SP decoding involves mainly four parts, i.e., the initialization, the check-to-variable computation, the variable-to-check computation, and the tentative decision. Let X be the random variable of a transmitted bit x, and let Y be the random variable of the corresponding channel output y. The a posteriori probability $Q(x|y) := \Pr(X = x \mid Y = y)$ for x = 0, 1 and $y \in \mathcal{Y}$ is assumed to be known to the decoder, where $\mathcal{Y}$ is the receiving alphabet.

initialization:

The decoder has collected $n = (1+\epsilon)k/C$ channel outputs $y_i$ for $i \in I$, where #I = n. Define $I_v := \{i \in I \mid v_i = v\}$. It follows that $I = \bigcup_{v=1}^{N} I_v$. For each variable node v = 1, …, N in $C_1$, calculate $p_v^{(0)}(x)$ for $x \in \mathrm{GF}(2^m)$ as follows:

$$p_v^{(0)}(x) = \xi \prod_{i \in I_v} \begin{cases} Q(0|y_i) & \text{if the } w_i\text{-th bit of } h_i x \text{ is } 0, \\ Q(1|y_i) & \text{if the } w_i\text{-th bit of } h_i x \text{ is } 1, \end{cases} \qquad (1)$$

where ξ is the normalization factor such that $\sum_{x \in \mathrm{GF}(2^m)} p_v^{(0)}(x) = 1$. Each variable node v = 1, …, N in $C_1$ sends the initial message $p_{vc}^{(0)} = p_v^{(0)}$ to each adjacent check node c. Set the iteration round as $\ell := 0$.
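Eq. (1) can be sketched directly: each collected bit $y_i$ constrains the $w_i$-th bit of $h_i x$, and the per-output probabilities $Q(\cdot|y_i)$ are multiplied over $i \in I_v$. The sketch assumes $\mathrm{GF}(2^3)$ with $\pi(x) = x^3 + x + 1$, counts $w_i$ from 0, takes $(Q(0|y_i), Q(1|y_i))$ as an input pair, and uses a binary symmetric channel with crossover probability 0.1 purely as an example; the helper names are hypothetical.

```python
# Assumption: GF(2^3) with pi(x) = x^3 + x + 1; bit j of a symbol is the
# coefficient of alpha^j.
M_BITS, POLY = 3, 0b1011
Q_SIZE = 2 ** M_BITS


def gf_mul(a, b):
    """Multiply two GF(2^m) elements (shift-and-add, reducing by POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a              # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1
        if a >> M_BITS:         # degree m reached: reduce modulo pi
            a ^= POLY
    return r


def initial_message(observations):
    """p_v^(0) from the outputs collected for one variable node v.

    observations: list of (w_i, h_i, (Q(0|y_i), Q(1|y_i))) for i in I_v.
    """
    p = [1.0] * Q_SIZE
    for w, h, (q0, q1) in observations:
        for x in range(Q_SIZE):
            bit = (gf_mul(h, x) >> w) & 1   # w_i-th bit of h_i * x
            p[x] *= q1 if bit else q0
    total = sum(p)                          # xi = 1 / total
    return [px / total for px in p]


def bsc_app(y, delta=0.1):
    """(Q(0|y), Q(1|y)) for a BSC with crossover delta and uniform inputs."""
    return (1 - delta, delta) if y == 0 else (delta, 1 - delta)


# Three noiseless-looking outputs for x_v = alpha^6 = (1, 0, 1), all with h_i = 1.
obs = [(0, 1, bsc_app(1)), (1, 1, bsc_app(0)), (2, 1, bsc_app(1))]
p = initial_message(obs)
print(max(range(Q_SIZE), key=p.__getitem__))   # 5, i.e. alpha^6
```

A variable node with no collected outputs ($I_v = \emptyset$) receives the uniform vector, i.e., no information about its symbol.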

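check to variable output:

For each check node c = 1, …, M in $C_1$, let $V_c$ be the set of adjacent variable nodes. It holds that $\#V_c = 3$, since the pre-code $C_1$ is (2,3)-regular. Each c has 3 incoming messages $p_{vc}^{(\ell)}$ for $v \in V_c$ from the 3 adjacent variable nodes. The check node c sends the following message $p_{cv}^{(\ell+1)} \in \mathbb{R}^{2^m}$ to each adjacent variable node $v \in V_c$:

$$\tilde{p}_{vc}^{(\ell)}(x) = p_{vc}^{(\ell)}(h_{cv}^{-1}x) \quad \text{for } x \in \mathrm{GF}(2^m),$$

$$\tilde{p}_{cv}^{(\ell+1)} = \bigotimes_{v' \in V_c \setminus \{v\}} \tilde{p}_{v'c}^{(\ell)},$$

$$p_{cv}^{(\ell+1)}(x) = \tilde{p}_{cv}^{(\ell+1)}(h_{cv}x) \quad \text{for } x \in \mathrm{GF}(2^m),$$

where $p_1 \otimes p_2 \in \mathbb{R}^{2^m}$ is the convolution of $p_1 \in \mathbb{R}^{2^m}$ and $p_2 \in \mathbb{R}^{2^m}$. To be precise,

$$(p_1 \otimes p_2)(x) := \sum_{\substack{y, z \in \mathrm{GF}(2^m) \\ x = y + z}} p_1(y)\,p_2(z) \quad \text{for } x \in \mathrm{GF}(2^m).$$

The convolution appears to be the most complex part of the decoding. However, the convolutions can be calculated efficiently via the FFT and IFFT [14], [13]. Increment the iteration round as $\ell := \ell + 1$.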
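Because addition in $\mathrm{GF}(2^m)$ is bitwise XOR of the binary representations, the convolution ⊗ is an XOR-convolution, and the Fourier transform over $\mathrm{GF}(2)^m$ is the Walsh-Hadamard transform. The sketch below implements the three steps above for one check node: permuting each incoming message by $h_{cv}$, convolving in the transform domain, and permuting back. It assumes $\mathrm{GF}(2^3)$ with $\pi(x) = x^3 + x + 1$ and uses plain lists; it illustrates the technique and is not an optimized decoder.

```python
# Assumption: GF(2^3) with pi(x) = x^3 + x + 1; bit j of a symbol is the
# coefficient of alpha^j.
M_BITS, POLY = 3, 0b1011
Q_SIZE = 2 ** M_BITS


def gf_mul(a, b):
    """Multiply two GF(2^m) elements (shift-and-add, reducing by POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= POLY
    return r


def gf_inv(a):
    """Inverse of a nonzero element by exhaustive search (small fields only)."""
    return next(b for b in range(1, Q_SIZE) if gf_mul(a, b) == 1)


def wht(p):
    """Unnormalized Walsh-Hadamard transform; wht(wht(p)) == Q_SIZE * p."""
    p = list(p)
    h = 1
    while h < len(p):
        for i in range(0, len(p), 2 * h):
            for j in range(i, i + h):
                p[j], p[j + h] = p[j] + p[j + h], p[j] - p[j + h]
        h *= 2
    return p


def xor_conv(p1, p2):
    """(p1 (x) p2)(x) = sum over y ^ z == x of p1(y) * p2(z)."""
    f = [a * b for a, b in zip(wht(p1), wht(p2))]
    return [c / Q_SIZE for c in wht(f)]


def check_to_variable(msgs, coeffs):
    """Outgoing p_cv for every neighbor of one check node c.

    msgs[j]: incoming p_{v_j c}; coeffs[j]: nonzero coefficient h_{c v_j}.
    """
    # p~_vc(x) = p_vc(h_cv^{-1} x): distribution of the product h_cv * x_v
    tilde = [[p[gf_mul(gf_inv(h), x)] for x in range(Q_SIZE)]
             for p, h in zip(msgs, coeffs)]
    out = []
    for j, h in enumerate(coeffs):
        acc = [1.0] + [0.0] * (Q_SIZE - 1)    # point mass at 0: neutral element
        for k, q in enumerate(tilde):
            if k != j:
                acc = xor_conv(acc, q)
        out.append([acc[gf_mul(h, x)] for x in range(Q_SIZE)])  # p_cv(x) = p~_cv(h_cv x)
    return out


def point_mass(s):
    return [1.0 if x == s else 0.0 for x in range(Q_SIZE)]


# Check with h = (1, alpha, alpha^2); two neighbors are known exactly.
uniform = [1.0 / Q_SIZE] * Q_SIZE
out = check_to_variable([point_mass(6), point_mass(1), uniform], [1, 2, 4])
print(out[2])   # point mass at 1, since 1*6 + alpha*1 + alpha^2*1 = 0
```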



variable to check output:

Each variable node v = 1, …, N in $C_1$ has 2 adjacent check nodes, since the pre-code $C_1$ is (2,3)-regular. Let $C_v$ be the set of adjacent check nodes. The message $p_{vc}^{(\ell)} \in \mathbb{R}^{2^m}$ sent from v to $c \in C_v$ is given by

$$p_{vc}^{(\ell)}(x) = p_v^{(0)}(x) \prod_{c' \in C_v \setminus \{c\}} p_{c'v}^{(\ell)}(x) \quad \text{for } x \in \mathrm{GF}(2^m).$$
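The variable-to-check rule is a pointwise product of the channel message and the extrinsic check messages. The sketch below adds a normalization so that each message remains a probability vector; the formula above is written without a normalization factor, and normalizing does not change any later argmax. The dictionary layout and the uniform fallback for an all-zero product are assumptions of the sketch.

```python
def variable_to_check(p0, incoming):
    """Messages p_vc from one variable node v to each adjacent check c.

    p0       : channel message p_v^(0) (list of probabilities)
    incoming : dict {c: p_cv} of messages from the adjacent checks
    """
    out = {}
    for c in incoming:
        p = list(p0)
        for c2, q in incoming.items():
            if c2 != c:                         # extrinsic: skip the destination check
                p = [a * b for a, b in zip(p, q)]
        s = sum(p)
        # Normalization is added by this sketch; contradictory inputs
        # (all-zero product) fall back to the uniform vector.
        out[c] = [a / s for a in p] if s > 0 else [1.0 / len(p)] * len(p)
    return out


# Toy vectors of length 4; check 0 and check 1 favor different symbols.
msgs = variable_to_check([0.25] * 4, {0: [0.7, 0.1, 0.1, 0.1], 1: [0.1, 0.1, 0.7, 0.1]})
print(msgs[0])   # proportional to the message from check 1
```

With a uniform channel message, each outgoing message simply forwards the other check's opinion, which shows why the rule must exclude the destination check.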

tentative decision:

For each v = 1, …, N, the tentatively estimated v-th transmitted symbol is given as

$$\hat{x}_v^{(\ell)} = \mathop{\mathrm{argmax}}_{x \in \mathrm{GF}(2^m)} \; p_v^{(0)}(x) \prod_{c \in C_v} p_{cv}^{(\ell)}(x).$$

If $\hat{x}^{(\ell)} := (\hat{x}_1^{(\ell)}, \ldots, \hat{x}_N^{(\ell)})$ forms a codeword of $C_1$, i.e., $\hat{x}^{(\ell)}$ satisfies every parity-check equation of $C_1$,

$$\sum_{v \in V_c} h_{cv} \hat{x}_v^{(\ell)} = 0 \in \mathrm{GF}(2^m)$$

for all c = 1, …, M, the decoder outputs $\hat{x}^{(\ell)}$ as the estimated codeword. Otherwise, repeat the check-to-variable, variable-to-check and tentative decision steps. If the iteration round ℓ reaches a predetermined number, the decoder collects more channel outputs and starts the decoding over.
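The tentative decision and the stopping rule can be sketched as follows, assuming $\mathrm{GF}(2^3)$ with $\pi(x) = x^3 + x + 1$ and representing each check c by a list of $(v, h_{cv})$ pairs, a hypothetical data layout chosen for the sketch. The toy checks and belief vectors are illustrative only.

```python
# Assumption: GF(2^3) with pi(x) = x^3 + x + 1; checks are lists of (v, h_cv).
M_BITS, POLY = 3, 0b1011
Q_SIZE = 2 ** M_BITS


def gf_mul(a, b):
    """Multiply two GF(2^m) elements (shift-and-add, reducing by POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= POLY
    return r


def tentative_decision(p0, check_msgs):
    """x_hat_v = argmax_x p_v^(0)(x) * prod_{c in C_v} p_cv(x), for every v."""
    x_hat = []
    for p, msgs in zip(p0, check_msgs):
        belief = list(p)
        for q in msgs:
            belief = [a * b for a, b in zip(belief, q)]
        x_hat.append(max(range(len(belief)), key=belief.__getitem__))
    return x_hat


def is_codeword(checks, x_hat):
    """True iff sum_{v in V_c} h_cv * x_hat_v = 0 in GF(2^m) for every check c."""
    for row in checks:
        s = 0
        for v, h in row:
            s ^= gf_mul(h, x_hat[v])
        if s:
            return False
    return True


def peaked(s, hi=0.65, lo=0.05):
    """Toy probability vector: mass hi on symbol s, lo on each other symbol."""
    return [hi if x == s else lo for x in range(Q_SIZE)]


# Toy checks (not (2,3)-regular): x0 + alpha*x1 + alpha^2*x2 = 0 and
# (1 + alpha)*x0 + x2 = 0.
checks = [[(0, 1), (1, 2), (2, 4)], [(0, 3), (2, 1)]]
p0 = [peaked(6), peaked(1), peaked(0)]          # the channel alone favors x2 = 0
check_msgs = [[], [], [peaked(1, 0.86, 0.02)]]  # a check message favors x2 = 1
x_hat = tentative_decision(p0, check_msgs)
print(x_hat, is_codeword(checks, x_hat))        # [6, 1, 1] True
```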

IV. Analysis of Asymptotic Overhead 

In this section, we investigate the overhead ε in the limit of many information bits $k \to \infty$ for transmission over the BEC, i.e., C = 1. Rathi developed a density evolution that enables the prediction of the decoding performance of non-binary LDPC codes in the limit of large code length. For a given code ensemble, density evolution usually gives the maximum channel erasure probability, referred to as the threshold, at which the average decoding erasure probability goes to zero. We use density evolution to calculate the minimum overhead ε at which the average decoding erasure probability goes to zero in the limit of $k \to \infty$.



The density evolution used in this section was originally developed for non-binary LDPC code ensembles with parity-check matrices defined over the general linear group GL(GF(2), m). However, Rathi reported that the code ensembles defined over $\mathrm{GF}(2^m)$ and GL(GF(2), m) have approximately the same threshold, within the order of $10^{-\dots}$. Consequently, we evaluate the threshold of the proposed codes by the density evolution for GL(GF(2), m).

In the binary case, we can predict the asymptotic decoding performance of LDPC codes transmitted over general MBIOS channels in the large code length limit by density evolution [15]. Density evolution is also possible for non-binary LDPC codes [16], but it is computationally intensive and tractable only for the BEC. The analysis for the BEC often helps us capture the universal properties of LDPC codes.

When transmission takes place over the BEC and the all-zero codeword is assumed to be sent, the messages, which in general are probability vectors $(p(x))_{x \in \mathrm{GF}(2^m)}$ of length $2^m$, can be reduced to linear subspaces of $\mathrm{GF}(2)^m$ [15]. To be precise, for each message in the SP decoding algorithm, the set $\{\underline{x} \in \mathrm{GF}(2)^m \mid p(x) \neq 0\}$ forms a linear subspace of $\mathrm{GF}(2)^m$, where $\underline{x}$ is the binary representation of $x \in \mathrm{GF}(2^m)$.

For messages in SP decoding, probability vectors $P = (P_0, \ldots, P_m)$ are used for the density evolution and are referred to as densities. The i-th entry $P_i$ is the probability that a message forms a subspace of dimension i for i = 0, …, m. Define $P^{(\ell)}$ and $Q^{(\ell)}$ as the densities of messages sent from variable nodes and check nodes at the ℓ-th iteration round, respectively. In [17], Rathi proved that the density of outgoing messages from a variable (resp. check) node of degree 3 with two incoming messages of densities P and Q is given by $P \boxdot Q$ (resp. $P \boxtimes Q$). The detailed calculations of $P \boxdot Q$ and $P \boxtimes Q$ are given in [17]. Using these two operations on densities, the density evolution in [17] gives recursive update equations of $P^{(\ell)}$ and $Q^{(\ell)}$ for $\ell \ge 0$.

Rathi [13] developed the density evolution for the BEC that tracks probability densities of the dimension of the linear subspaces. For $\ell \ge 0$, the density evolution tracks the probability vectors $P^{(\ell)}$ and $Q^{(\ell)}$, which are referred to as densities. The initial message in Eq. (1) can be seen as the intersection of the d subspaces given by the d channel outputs collected for that variable node.

With overhead ε, the decoder has $k(1+\epsilon)/C$ channel outputs transmitted over the channel with capacity C. The number of variable nodes in $C_1$ is N. It holds that $N = 3(N - M) = 3k/m$, since $C_1$ is of rate 1/3 and defined over $\mathrm{GF}(2^m)$.
C^{m,k,i,j) := 2('-'=)(-'-*) 
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n 



2*^ - 2' 



is a 2-Gaussian binomial. 



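The average number of collected channel outputs per variable node in $C_1$ is given by $\beta := \frac{(1+\epsilon)k/C}{N} = \frac{(1+\epsilon)m}{3C}$, i.e., $\beta = (1+\epsilon)m/3$ on the BEC with C = 1. Since each collected output is attached to a uniformly chosen variable node, the probability $R_d$ that a randomly chosen variable node in $C_1$ has d corresponding channel outputs is given by

$$R_d = \binom{\beta N}{d} \left(\frac{1}{N}\right)^{d} \left(1 - \frac{1}{N}\right)^{\beta N - d}.$$

It follows that

$$\sum_{d \ge 0} R_d x^d = \left(1 - \frac{1}{N} + \frac{x}{N}\right)^{\beta N} \xrightarrow{N \to \infty} e^{\beta(x-1)} = \sum_{d \ge 0} \frac{\beta^d e^{-\beta}}{d!} x^d. \qquad (2)$$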



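From this, we see that the probability that a randomly chosen variable node in $C_1$ has d corresponding channel outputs in the limit of $k \to \infty$ is $\frac{\beta^d e^{-\beta}}{d!}$. The density of the initial messages is given by $P^{(0)}$ as follows:

$$P^{(0)} = \sum_{d \ge 0} \frac{\beta^d e^{-\beta}}{d!} \, \underbrace{E \boxdot \cdots \boxdot E}_{d \text{ times}},$$

where, for d = 0, the empty product denotes the density concentrated on dimension m, and E is a density such that the subspace is of dimension m − 1 with probability 1: under the all-zero codeword assumption, each collected bit fixes one linear combination of the m bits of the symbol and thus restricts the message to a hyperplane. To be precise, $E := (E_0, \ldots, E_m)$ with

$$E_i := \begin{cases} 1 & \text{if } i = m - 1, \\ 0 & \text{if } i \neq m - 1. \end{cases}$$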
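The Poisson limit in Eq. (2) is easy to check numerically. The sketch compares the exact binomial probabilities $R_d$ (n = βN outputs, each attached to one of N variable nodes uniformly at random) with $\beta^d e^{-\beta}/d!$. The value β = 8/3 is an illustrative choice: with N = 3k/m and C = 1, β = n/N = (1 + ε)m/3, which equals 8/3 for m = 8 and ε = 0. Le Cam's inequality bounds the total gap by 2β/N.

```python
import math


def R(d, N, beta):
    """Exact probability that a fixed variable node receives d of the
    n = beta * N collected outputs, each attached to a uniform node."""
    n = round(beta * N)
    return math.comb(n, d) * (1 / N) ** d * (1 - 1 / N) ** (n - d)


def poisson(d, beta):
    return beta ** d * math.exp(-beta) / math.factorial(d)


def max_gap(N, beta, dmax=15):
    """Largest pointwise difference between R_d and the Poisson limit."""
    return max(abs(R(d, N, beta) - poisson(d, beta)) for d in range(dmax))


beta = 8 / 3   # m = 8, eps = 0 on the BEC
for N in (30, 300, 3000):
    print(N, max_gap(N, beta))   # the gap shrinks at least like 2 * beta / N
```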



Since the pre-code is a (2,3)-regular LDPC code, we have the following recursive update equations of densities:

$$Q^{(\ell+1)} = P^{(\ell)} \boxtimes P^{(\ell)}, \qquad P^{(\ell+1)} = P^{(0)} \boxdot Q^{(\ell+1)}.$$

Since messages of dimension 0 correspond to successful decoding, the asymptotic overhead $\epsilon^*$ is defined as follows:

$$\epsilon^* := \sup\left\{\epsilon \in [0,1] \;\middle|\; \lim_{\ell \to \infty} P_0^{(\ell)} < 1\right\}.$$

It follows that, in the limit of many information bits $k \to \infty$, reliable transmission with the proposed $C_\infty$ is possible with overhead $\epsilon > \epsilon^*$.
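A minimal density-evolution sketch for the recursion above, under stated assumptions. It reads $\boxdot$ (variable node) as the density of the dimension of the intersection, and $\boxtimes$ (check node) as the density of the dimension of the sum, of two independent subspaces of $\mathrm{GF}(2)^m$ in uniformly random relative position, as one expects for the GL(GF(2), m) ensemble. It uses the standard count: the number of j-dimensional subspaces meeting a fixed i-dimensional subspace in dimension k is $2^{(i-k)(j-k)}\binom{i}{k}_2\binom{m-i}{j-k}_2$. This reading is an assumption of the sketch, not a transcription of the operations in [17]. The function names, iteration count, tolerance, and Poisson truncation are arbitrary choices, and the sketch is not guaranteed to reproduce the value $\epsilon^* = 0.081$ reported for m = 8.

```python
import math


def gbinom(n, k):
    """2-Gaussian binomial: number of k-dimensional subspaces of GF(2)^n."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= 2 ** n - 2 ** i
        den *= 2 ** k - 2 ** i
    return num // den


def meet_table(m):
    """T[i][j][k] = Pr(dim(U cap W) = k), U, W uniform random subspaces of dims i, j."""
    T = [[[0.0] * (m + 1) for _ in range(m + 1)] for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(m + 1):
            for k in range(max(0, i + j - m), min(i, j) + 1):
                count = 2 ** ((i - k) * (j - k)) * gbinom(i, k) * gbinom(m - i, j - k)
                T[i][j][k] = count / gbinom(m, j)
    return T


def combine(P, Q, T, check):
    """check=False: intersection dimension; check=True: dim(U + W) = i + j - dim(U cap W)."""
    m = len(P) - 1
    R = [0.0] * (m + 1)
    for i, pi in enumerate(P):
        for j, qj in enumerate(Q):
            if pi and qj:
                for k in range(max(0, i + j - m), min(i, j) + 1):
                    R[i + j - k if check else k] += pi * qj * T[i][j][k]
    total = sum(R)                 # renormalize so rounding cannot drift the mass
    return [r / total for r in R]


def run_de(m, eps, iters=200):
    """Return P_0 after `iters` rounds on the BEC (C = 1)."""
    T = meet_table(m)
    beta = (1 + eps) * m / 3       # outputs per pre-code symbol (N = 3k/m)
    E = [0.0] * (m + 1)
    E[m - 1] = 1.0                 # one collected bit: a hyperplane
    cur = [0.0] * (m + 1)
    cur[m] = 1.0                   # d = 0: whole space, no information
    P_init = [0.0] * (m + 1)
    for d in range(80):            # truncated Poisson mixture of E [.] ... [.] E
        w = math.exp(-beta) * beta ** d / math.factorial(d)
        P_init = [a + w * c for a, c in zip(P_init, cur)]
        cur = combine(cur, E, T, check=False)
    total = sum(P_init)
    P_init = [a / total for a in P_init]
    P = P_init
    for _ in range(iters):
        Q = combine(P, P, T, check=True)          # Q^(l+1) = P^(l) [x] P^(l)
        P = combine(P_init, Q, T, check=False)    # P^(l+1) = P^(0) [.] Q^(l+1)
    return P[0]


def threshold(m, steps=10, tol=1e-6):
    """Bisection estimate of eps*: DE counts as successful if P_0 > 1 - tol."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if run_de(m, mid) > 1 - tol:
            hi = mid
        else:
            lo = mid
    return hi


print(threshold(8))   # compare with eps* = 0.081 reported for m = 8 in Table I
```

For m = 1 the recursion reduces to $p \mapsto e^{-\beta}(2p - p^2)$ with $\beta \le 2/3$, which has a nonzero fixed point for every ε in [0, 1]; this is consistent with the very poor binary overhead noted for Table I.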

Table I shows the asymptotic overhead $\epsilon^*$ of the proposed code $C_\infty$ over $\mathrm{GF}(2^m)$ for m = 1, …, 19. Table I also lists the asymptotic overheads with $(2,d_c)$-regular non-binary LDPC pre-codes for $d_c$ = 4, 5 and 6. It can be seen that the best overhead $\epsilon^* = 0.079$ is attained at $d_c = 3$ and m = 9, and that the fountain code $C_\infty$ exhibits very poor overhead if defined over $\mathrm{GF}(2^m)$ with m = 1, i.e., the binary field. We use m = 8 for its good asymptotic overhead $\epsilon^* = 0.081$ in Table I and its friendliness to byte-oriented processors.

V. Numerical Results 

In this section, we present demonstrations of $C_\infty$ defined over $\mathrm{GF}(2^8)$ with small and moderate numbers of information bits. Transmission over the BEC and the AWGN channels is considered. Fig. 3 shows the histograms of overheads of $C_\infty$ defined over $\mathrm{GF}(2^8)$. As k grows, the overhead appears to concentrate around the asymptotic value 0.081 predicted in Section IV. Fig. 4 shows the decoding performance of the proposed fountain code transmitted over binary-input AWGN channels with capacity C = 1.0, 0.5 and 0.1. The horizontal axis describes the overhead and the vertical axis describes the block error rate. The proposed codes exhibit better performance than the best-so-far Raptor codes.

TABLE I
Asymptotic overhead $\epsilon^*$ of the proposed codes $C_\infty$ with a $(2,d_c)$-regular non-binary LDPC pre-code over $\mathrm{GF}(2^m)$ transmitted over the BEC, i.e., C = 1.0.

Fig. 3. Histograms of the overheads at which the proposed fountain code over $\mathrm{GF}(2^8)$ successfully recovers k information bits over the channel with C = 1.0. The number of information bits is set to k = 192, 512, 1024, 2048, 8192, 16384 and 32768 from top to bottom. The horizontal axis describes the overhead ε. It can be seen that the overhead concentrates around 0.081, as predicted in Table I for m = 8.

Fig. 4. Decoding performance of the proposed fountain codes for binary-input AWGN channels with capacity C = 1.0, 0.5 and 0.1. The information size is k = 1024. The performance of the best-so-far Raptor codes [11], [12], [4] optimized for k = 1024 is drawn for comparison.

VI. Conclusion 

In this paper, we proposed a new, simple fountain coding scheme whose decoding complexity does not depend on the number of collected channel outputs. No optimization of the output degree distribution is needed. Because of its non-binary nature, we believe the proposed codes can also be used for channels with memory.